VCF-53: Fix read_variant_stats() segfault when region is empty #871
Merged
alancleary merged 4 commits into main on Jan 22, 2026
These edge cases are to handle when the buffers returned by a dataset read are empty. Specifically, an array's offsets must still contain the initial 0 value even when there's no data, and a string array must always have 3 buffers, even when there's no data.
…rrow tables Previously the allele count binding would return None and the variant stats binding would try to parse the empty buffers returned by the reader, causing a segmentation fault.
…nt() These tests ensure that both methods return an empty DataFrame when the regions are empty or don't match the constraints of the query.
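The buffer invariants described in the first commit message can be illustrated with a minimal plain-Python sketch. It does not reproduce the project's actual conversion code; `build_string_buffers` is a hypothetical helper that assembles the three buffers of an Arrow-style string array (validity bitmap, offsets, data) and shows that an empty input still yields three buffers and an offsets buffer holding the initial 0.

```python
def build_string_buffers(values):
    """Return (validity, offsets, data) for a list of str or None values.

    Hypothetical helper for illustration only; offsets are kept as a
    Python list rather than a packed int32 buffer.
    """
    offsets = [0]  # the initial 0 is present even when there is no data
    data = bytearray()
    validity = bytearray((len(values) + 7) // 8)  # one bit per value
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # mark slot i as non-null
            data.extend(value.encode("utf-8"))
        offsets.append(len(data))  # end offset of slot i
    return bytes(validity), offsets, bytes(data)


# An empty read still produces three buffers and offsets == [0].
empty_validity, empty_offsets, empty_data = build_string_buffers([])

# A small non-empty read for comparison.
validity, offsets, data = build_string_buffers(["ab", None, "c"])
```

With this layout, a reader that indexes `offsets[0]` or expects three buffers behaves the same way for empty and non-empty reads, which is the property the edge-case handling is meant to guarantee.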
Note that the CLI failure will be fixed by PR #870.
jp-dark approved these changes on Jan 22, 2026
The issue was caused by the code that converts VCF read buffers into Arrow arrays assuming that the buffers would not be empty.
read_variant_stats() now returns an empty Arrow Table with the correct schema if the read buffers are empty. The read_allele_count() method was updated to do the same. Previously it returned None when the read buffers were empty, which needlessly complicated downstream parsing and aggregation.
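To illustrate why an empty result with a fixed schema is easier to consume than None, here is a small plain-Python sketch. It uses a dict of column lists as a stand-in for an Arrow Table; the column names in `SCHEMA` and the helpers `empty_result` and `concat` are hypothetical and are not taken from the library.

```python
# Hypothetical column names, chosen only for illustration.
SCHEMA = ("contig", "pos", "allele", "ac")


def empty_result():
    """Return an empty column-oriented result that still has every column."""
    return {name: [] for name in SCHEMA}


def concat(results):
    """Concatenate per-region results column by column."""
    merged = empty_result()
    for result in results:
        for name in SCHEMA:
            merged[name].extend(result[name])
    return merged


per_region = [
    {"contig": ["chr1"], "pos": [100], "allele": ["A"], "ac": [3]},
    empty_result(),  # a region with no matching variants
]

# No special case for None is needed; empty results merge like any other.
merged = concat(per_region)
```

Had the empty region been represented as None, `concat` would need an explicit check before touching `result[name]`, and every other consumer would need the same guard.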